NYC Traffic Accidents

Motor vehicle collisions reported by the New York City Police Department from January-August 2020. Each record represents an individual collision, including the date, time and location of the accident (borough, zip code, street name, latitude/longitude), vehicles and victims involved, and contributing factors.

Data source: https://www.mavenanalytics.io/data-playground

----Load data----
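Below is a minimal loading sketch with pandas. It makes two assumptions: the file name `NYC_Accidents_2020.csv` and the column names, which follow the NYPD Open Data collisions export (the Maven file may label them differently). A small inline CSV stands in for the real file, so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the Maven Analytics download; in the
# notebook this would be something like pd.read_csv("NYC_Accidents_2020.csv").
# File name and column names are assumptions.
SAMPLE_CSV = """CRASH DATE,CRASH TIME,BOROUGH,ON STREET NAME,NUMBER OF PERSONS INJURED,NUMBER OF PERSONS KILLED
2020-08-29,15:40:00,BRONX,EAST 177 STREET,0,0
2020-08-29,21:00:00,BROOKLYN,BELT PARKWAY,1,0
2020-08-30,08:15:00,QUEENS,,2,0
"""

# Parse the date column as datetime on load; empty fields become NaN.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["CRASH DATE"])
print(df.shape)
print(df.dtypes)
```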

----Start the data cleaning----

We need to make sure the data is clean before starting our analysis. As a reminder, we should check for:

Duplicate Records

How many duplicate collision records are there?
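The duplicate count can be sketched with `DataFrame.duplicated`. The toy frame and the `COLLISION_ID` column name are hypothetical. The sketch counts rows that are identical in every column separately from rows that only repeat the collision ID, because the two checks can give different answers.

```python
import pandas as pd

# Hypothetical toy frame; column names are assumptions.
df = pd.DataFrame({
    "COLLISION_ID": [101, 102, 102, 103, 103],
    "BOROUGH": ["BRONX", "QUEENS", "QUEENS", "BROOKLYN", "MANHATTAN"],
})

# Rows that repeat an earlier row in every column (first occurrence not counted).
n_full_dupes = df.duplicated().sum()

# Rows whose collision ID was already seen, regardless of the other columns.
n_id_dupes = df.duplicated(subset=["COLLISION_ID"]).sum()
print(n_full_dupes, n_id_dupes)
```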

Drop Duplicate Records

Drop the duplicated records.
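Dropping the duplicates is one call to `drop_duplicates`. The toy data and column names here are hypothetical, and the sketch keeps the first copy of each fully duplicated row.

```python
import pandas as pd

# Hypothetical toy frame with one fully duplicated row.
df = pd.DataFrame({
    "COLLISION_ID": [101, 102, 102, 103],
    "BOROUGH": ["BRONX", "QUEENS", "QUEENS", "BROOKLYN"],
})

# Keep the first occurrence of each duplicated row, then renumber the index.
df = df.drop_duplicates(keep="first").reset_index(drop=True)
print(len(df))
```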

Consistent formatting
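The notebook does not show which formatting fixes it applied. A plausible sketch, assuming text columns with stray whitespace or mixed letter case and separate date and time columns (all names hypothetical), might look like this:

```python
import pandas as pd

# Hypothetical rows: the same street written two different ways.
df = pd.DataFrame({
    "CRASH DATE": ["2020-08-29", "2020-08-29"],
    "CRASH TIME": ["15:40:00", "21:00:00"],
    "ON STREET NAME": ["  belt parkway", "BELT PARKWAY   "],
})

# Trim whitespace and use one letter case so identical streets group together.
df["ON STREET NAME"] = df["ON STREET NAME"].str.strip().str.title()

# Combine the separate date and time text columns into one datetime column.
df["CRASH DATETIME"] = pd.to_datetime(df["CRASH DATE"] + " " + df["CRASH TIME"])
print(df)
```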

Missing Values

How many missing values are there?
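Missing values per column can be counted with `isna()`. The sketch also expresses them as a percentage of rows, which makes it easier to decide whether to drop or keep a column. The toy data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in two columns.
df = pd.DataFrame({
    "BOROUGH": ["BRONX", np.nan, "QUEENS", np.nan],
    "ON STREET NAME": ["BELT PARKWAY", "BROADWAY", np.nan, "BROADWAY"],
    "NUMBER OF PERSONS INJURED": [0, 1, 0, 2],
})

missing_count = df.isna().sum()                   # missing cells per column
missing_pct = (df.isna().mean() * 100).round(1)   # same, as % of rows
print(pd.DataFrame({"missing": missing_count, "pct": missing_pct}))
```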

Rename columns
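The new column names the notebook chose are not shown. As a hypothetical example, one common convention is lowercase snake_case, optionally followed by explicit renames for long headers:

```python
import pandas as pd

# Hypothetical headers in the NYPD Open Data style.
df = pd.DataFrame(columns=["CRASH DATE", "CRASH TIME", "ON STREET NAME",
                           "CONTRIBUTING FACTOR VEHICLE 1"])

# Lowercase every header and replace spaces with underscores,
# e.g. "CRASH DATE" -> "crash_date".
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)

# Shorten a long name explicitly (the target name is a hypothetical choice).
df = df.rename(columns={"contributing_factor_vehicle_1": "contributing_factor"})
print(list(df.columns))
```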

Data cleaning finished! 👏 Now we are ready to answer questions and draw conclusions from our data. 👌 🍀

----Finish the data cleaning----

Time analysis

How many accidents were registered by the police in New York City in 2020 by date?

Number of accidents per date (Jan-Aug 2020)
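Each record is one collision, so the daily series behind this plot is a row count per date. The sketch below assumes a datetime column named `CRASH DATE` (hypothetical) and reindexes over the full calendar so that days with no accidents show up as zero.

```python
import pandas as pd

# Hypothetical toy data: no accidents on 2020-01-03.
df = pd.DataFrame({
    "CRASH DATE": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02",
                                  "2020-01-04"]),
})

# One row per collision, so counting rows per date gives accidents per date.
per_date = df.groupby("CRASH DATE").size()

# Reindex over every calendar day so zero-accident days still appear.
full_range = pd.date_range(per_date.index.min(), per_date.index.max(), freq="D")
per_date = per_date.reindex(full_range, fill_value=0)
print(per_date)
```

Calling `per_date.plot()` (with matplotlib installed) draws the line chart.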

Compare the % of total accidents by month.
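The monthly percentage can be computed with `value_counts(normalize=True)` on the month number, sorted back into calendar order. The column name and data are hypothetical.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "CRASH DATE": pd.to_datetime(["2020-01-05", "2020-01-20", "2020-02-03",
                                  "2020-03-15"]),
})

# Share of all accidents in each month (1 = January), in calendar order.
month_pct = (df["CRASH DATE"].dt.month
             .value_counts(normalize=True)
             .sort_index() * 100)
print(month_pct)
```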

As we can observe, the number of accidents has decreased since the end of February. One possible reason is that fewer people were driving to work during these months because of the COVID-19 pandemic.

Break down accident frequency by day of week and hour of day. Based on this data, when do accidents occur most frequently?

As we did with months, we can analyze the distribution of car accidents by hour of day and by day of week using bar plots.
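The counts behind the two bar plots can be taken from the `.dt` accessor. The sketch assumes a combined datetime column (hypothetical name `CRASH DATETIME`) and reindexes both results so that the bars appear in natural order (hours 0 to 23, Monday to Sunday) rather than in frequency order.

```python
import pandas as pd

# Hypothetical toy data: 2020-08-03 is a Monday, 2020-08-08 a Saturday.
df = pd.DataFrame({
    "CRASH DATETIME": pd.to_datetime(["2020-08-03 08:10", "2020-08-03 16:45",
                                      "2020-08-04 16:05", "2020-08-08 23:30"]),
})

# Accidents per hour of day (0-23), keeping hours with no accidents.
by_hour = (df["CRASH DATETIME"].dt.hour
           .value_counts()
           .reindex(range(24), fill_value=0))

# Accidents per weekday, ordered Monday..Sunday instead of by frequency.
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
by_day = (df["CRASH DATETIME"].dt.day_name()
          .value_counts()
          .reindex(days, fill_value=0))
print(by_hour[by_hour > 0])
print(by_day)
```

With matplotlib installed, `by_hour.plot.bar()` and `by_day.plot.bar()` draw the charts.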

As we can observe in the plot, the largest number of accidents occurs in the afternoon, between 14:00 and 18:00. Accidents tend to be more severe in the evening.

As shown in the plot above, the number of car accidents decreases at weekends. Weekdays average around 280-300 car accidents per day, roughly 40 more than weekend days.
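A per-day average like "280-300 accidents per weekday" means daily totals averaged over days, not a raw sum per weekday. One way to sketch it on hypothetical data is to count accidents per calendar day first, then average those totals separately for weekdays and weekends.

```python
import pandas as pd

# Hypothetical toy data: Mon 3, Tue 5, Sat 2, Sun 2 accidents.
df = pd.DataFrame({
    "CRASH DATE": pd.to_datetime(["2020-08-03"] * 3 + ["2020-08-04"] * 5
                                 + ["2020-08-08"] * 2 + ["2020-08-09"] * 2),
})

# Total accidents per calendar day.
daily = df.groupby("CRASH DATE").size()

# dayofweek: Monday=0 ... Saturday=5, Sunday=6.
is_weekend = daily.index.dayofweek >= 5
avg_weekday = daily[~is_weekend].mean()
avg_weekend = daily[is_weekend].mean()
print(avg_weekday, avg_weekend, avg_weekday - avg_weekend)
```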

Type of accident analysis

The data we are analyzing contains information related to (1) victims, (2) contributing factors, and (3) vehicles. Regarding the type of accident, the data frame includes the number of people injured and killed, the contributing factor of the accident, and the types of vehicles involved.

Number of people injured and killed

The data frame records how many people were injured and killed in each car accident. We can easily represent these percentages using a pie plot as follows:
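The slices of such a pie chart are the share of accidents for each injury count. The sketch below computes them with hypothetical column names and toy data. It also computes the share of accidents with at least one fatality.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "NUMBER OF PERSONS INJURED": [0, 0, 0, 1, 2, 0, 1, 0],
    "NUMBER OF PERSONS KILLED":  [0, 0, 0, 0, 1, 0, 0, 0],
})

# Percentage of accidents by number of people injured (0, 1, 2, ...).
injured_pct = (df["NUMBER OF PERSONS INJURED"]
               .value_counts(normalize=True)
               .sort_index() * 100)

# Percentage of accidents with at least one person killed.
fatal_pct = (df["NUMBER OF PERSONS KILLED"] > 0).mean() * 100
print(injured_pct, fatal_pct)
```

With matplotlib installed, `injured_pct.plot.pie(autopct="%1.0f%%")` draws the pie chart.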

The plot shows that 73% of the accidents had no injured victims, and 21% had one injured victim.

Fewer than 1% of the accidents resulted in a fatality.

On which particular street were the most accidents reported? What does that represent as a % of all reported accidents?
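A sketch of the street ranking follows, assuming a street column named `ON STREET NAME` (hypothetical). Note the denominator: here the percentage is taken over all reported accidents, including rows with no street recorded. Dividing by `street_counts.sum()` instead would give the share among accidents that do have a street name.

```python
import pandas as pd

# Hypothetical toy data; None marks a missing street name.
df = pd.DataFrame({
    "ON STREET NAME": ["BROADWAY", "BELT PARKWAY", "BROADWAY", None,
                       "ATLANTIC AVENUE", "BROADWAY"],
})

street_counts = df["ON STREET NAME"].value_counts()   # missing streets dropped
top_street = street_counts.index[0]
top_count = street_counts.iloc[0]

# Percentage relative to ALL reported accidents, including rows with no street.
top_pct = top_count / len(df) * 100
top10 = street_counts.head(10)                          # data for the bar chart
print(top_street, top_count, top_pct)
```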

Top 10 streets by reported accidents (Jan - Aug 2020)

What was the most common contributing factor for the accidents reported in this sample (based on Vehicle 1)? What about for fatal accidents specifically?
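Both questions reduce to `value_counts` on the Vehicle 1 factor column. The second one simply filters to accidents with at least one fatality first. Column names and data are hypothetical. In the real data you may also want to exclude a placeholder category such as "Unspecified" before ranking.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "CONTRIBUTING FACTOR VEHICLE 1": [
        "Driver Inattention/Distraction", "Unspecified",
        "Driver Inattention/Distraction", "Unsafe Speed",
        "Unsafe Speed", "Driver Inattention/Distraction"],
    "NUMBER OF PERSONS KILLED": [0, 0, 0, 1, 1, 0],
})

factor = df["CONTRIBUTING FACTOR VEHICLE 1"]

# Most common factor across all accidents (top 10 for the bar chart).
top_factors = factor.value_counts().head(10)
most_common = top_factors.index[0]

# Same question restricted to accidents with at least one fatality.
fatal_factors = factor[df["NUMBER OF PERSONS KILLED"] > 0].value_counts()
most_common_fatal = fatal_factors.index[0]
print(most_common, most_common_fatal)
```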

Top 10 contributing factors for accidents (Jan - Aug 2020)

Top 10 vehicle types involved in accidents (Jan - Aug 2020)
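The vehicle ranking can be sketched the same way. Two assumptions apply: only the Vehicle 1 type column is used, and its name `VEHICLE TYPE CODE 1` is hypothetical. Counting all vehicles in each collision would need the other vehicle columns as well.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "VEHICLE TYPE CODE 1": ["Sedan", "Station Wagon/Sport Utility Vehicle",
                            "Sedan", "Taxi", "Sedan", "Bike"],
})

# Percentage of accidents per vehicle type, top 10 only.
vehicle_pct = (df["VEHICLE TYPE CODE 1"]
               .value_counts(normalize=True)
               .head(10) * 100)
print(vehicle_pct)
```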

Type of accident analysis — conclusions

(1) Most accidents involved a Sedan, Station Wagon/Sport Utility, or Taxi. Nearly half of the accidents involved a Sedan.

(2) Most of the accidents in 2020 (73%) had no injured victims.

(3) Accidents tend to be more severe at night, in the late evening, and at weekends.

(4) Driver Inattention/Distraction is the most common contributing factor (around 25% of accidents).

Location analysis

Density map of car accidents in NYC (Jan-Aug 2020)

A good way to analyze spatial data is with maps. Folium is a Python library that helps you create several types of Leaflet maps. We can easily generate a map of New York City by creating a Folium Map object. The location argument centers the map on a specific location (in our case, New York City), and we can also set an initial zoom level to zoom in on that center.

Looking at the timeline above, we can observe how the number of accidents changes from month to month.

Thanks for reading!!! 😊 🍀